Eliminating ghosting artifacts caused by moving objects is a challenging problem in high dynamic range (HDR) imaging. In this letter, we present a hybrid model consisting of a convolutional encoder and a Transformer decoder to generate ghost-free HDR images. In the encoder, a context aggregation network and a non-local attention block are adopted to optimize multi-scale features and capture both global and local dependencies of multiple low dynamic range (LDR) images. A decoder based on the Swin Transformer is utilized to improve the reconstruction capability of the proposed model. Motivated by the marked difference between the presence and absence of artifacts in the structure tensor (ST) field, we integrate the ST information of the LDR images as auxiliary inputs to the network and use an ST loss to further suppress artifacts. Unlike previous approaches, our network can process an arbitrary number of input LDR images. Qualitative and quantitative experiments demonstrate the effectiveness of the proposed method in comparison with existing state-of-the-art HDR deghosting models. Code is available at https://github.com/pandayuanyu/HSTHdr.
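For readers unfamiliar with the structure tensor, the sketch below shows one standard way to compute an ST field and an L1-style ST loss in PyTorch. The function names, the Sobel gradients, and the box-filter smoothing are our illustrative assumptions, not the authors' implementation.

```python
# A minimal sketch (not the authors' code) of a structure-tensor (ST) map
# and an L1 ST loss, assuming grayscale tensors of shape (B, 1, H, W).
import torch
import torch.nn.functional as F

def structure_tensor(img: torch.Tensor, ksize: int = 5) -> torch.Tensor:
    """Return the three distinct ST components (Ixx, Iyy, Ixy), locally smoothed."""
    sobel_x = torch.tensor([[-1., 0., 1.], [-2., 0., 2.], [-1., 0., 1.]],
                           device=img.device).view(1, 1, 3, 3) / 8.0
    sobel_y = sobel_x.transpose(2, 3)
    ix = F.conv2d(img, sobel_x, padding=1)   # horizontal gradient
    iy = F.conv2d(img, sobel_y, padding=1)   # vertical gradient
    # Outer products of the gradient, smoothed with a box filter as a
    # stand-in for the usual Gaussian window.
    comps = torch.cat([ix * ix, iy * iy, ix * iy], dim=1)
    return F.avg_pool2d(comps, ksize, stride=1, padding=ksize // 2)

def st_loss(pred_hdr: torch.Tensor, gt_hdr: torch.Tensor) -> torch.Tensor:
    """Hypothetical ST loss: L1 distance between structure-tensor fields."""
    return F.l1_loss(structure_tensor(pred_hdr), structure_tensor(gt_hdr))
```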
We present CrossHuman, a novel method that learns cross-guidance from a parametric human model and multi-frame RGB images to achieve high-quality 3D human reconstruction. To recover geometric details and texture even in invisible regions, we design a reconstruction pipeline that combines tracking-based and tracking-free methods. Given a monocular RGB sequence, we track the parametric human model over the whole sequence, and the points (voxels) corresponding to the target frame are warped to a reference frame by the parametric body motion. Guided by the geometric prior of the parametric body and the spatially aligned features of the RGB sequence, a robust implicit surface is fused. In addition, a multi-frame transformer (MFT) and a self-supervised warp refinement module are integrated into the framework to relax the requirements on the parametric body and to help handle very loose clothing. Compared with previous works, our CrossHuman enables high-fidelity geometric details and texture in both visible and invisible regions and improves the accuracy of human reconstruction, even under inaccurately estimated parametric human models. Experiments show that our method achieves state-of-the-art (SOTA) performance.
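As a rough illustration of the multi-frame fusion idea, the following sketch pools per-frame pixel-aligned features for each query point with a small transformer before predicting occupancy. All shapes, module names, and the mean-pooling readout are hypothetical, not the CrossHuman code.

```python
# A minimal sketch (assumptions, not the paper's code) of a multi-frame
# transformer that fuses per-frame features sampled at the warped location
# of each query point, then predicts an occupancy value.
import torch
import torch.nn as nn

class MultiFrameFusion(nn.Module):
    def __init__(self, feat_dim: int = 256, n_heads: int = 4, n_layers: int = 2):
        super().__init__()
        layer = nn.TransformerEncoderLayer(feat_dim, n_heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, n_layers)
        self.occ_head = nn.Sequential(nn.Linear(feat_dim, 128), nn.ReLU(),
                                      nn.Linear(128, 1), nn.Sigmoid())

    def forward(self, frame_feats: torch.Tensor) -> torch.Tensor:
        # frame_feats: (num_points, num_frames, feat_dim), one feature per
        # frame per query point, already spatially aligned by warping.
        fused = self.encoder(frame_feats).mean(dim=1)  # pool over frames
        return self.occ_head(fused)                    # occupancy in [0, 1]

points, frames, dim = 1024, 4, 256
occ = MultiFrameFusion(dim)(torch.randn(points, frames, dim))
print(occ.shape)  # torch.Size([1024, 1])
```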
Pretraining molecular representation models without labels is fundamental to a variety of applications. Conventional methods mainly process 2D molecular graphs and focus only on 2D tasks, leaving their pretrained models unable to characterize 3D geometry and therefore deficient for downstream 3D tasks. In this work, we address 3D molecular pretraining in a complete and novel sense. In particular, we first propose adopting an energy-based model as the pretraining backbone, which has the merit of fulfilling 3D spatial symmetry. We then develop a node-level pretraining loss for force prediction, where we further exploit the Riemann-Gaussian distribution to ensure that the loss is E(3)-invariant, enabling more robustness. Moreover, a graph-level noise scale prediction task is also leveraged to further promote the final performance. We evaluate our model, pretrained on the large-scale 3D dataset GEOM-QM9, on two challenging 3D benchmarks: MD17 and QM9. The experimental results support the better efficacy of our method against current state-of-the-art pretraining approaches and validate the effectiveness of our design.
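The sketch below illustrates the two pretraining objectives in a heavily simplified form, using a plain Gaussian denoising target in place of the paper's E(3)-invariant Riemann-Gaussian formulation. The model interface, the scale set, and the loss weighting are assumptions.

```python
# A minimal sketch (assumed names, not the paper's code) of the two
# objectives described above: node-level force prediction on perturbed
# coordinates, plus graph-level noise-scale prediction.
import torch
import torch.nn.functional as F

def pretrain_losses(model, pos, sigmas=(0.01, 0.1, 1.0)):
    # pos: (num_atoms, 3) clean conformer coordinates.
    sigma_idx = torch.randint(len(sigmas), (1,)).item()
    sigma = sigmas[sigma_idx]
    noise = torch.randn_like(pos) * sigma
    # The model is assumed to return per-node forces and a graph-level
    # logit vector over the candidate noise scales.
    forces, scale_logits = model(pos + noise)
    force_loss = F.mse_loss(forces, -noise / sigma)   # denoising target
    scale_loss = F.cross_entropy(scale_logits.unsqueeze(0),
                                 torch.tensor([sigma_idx]))
    return force_loss + scale_loss

toy = lambda x: (-x, torch.zeros(3))  # stand-in: forces and 3 scale logits
print(pretrain_losses(toy, torch.randn(5, 3)))
```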
Analog/mixed-signal circuit design is one of the most complex and time-consuming stages in the whole chip design process. Due to the various process, voltage, and temperature (PVT) variations in chip fabrication, analog circuits inevitably suffer from performance degradation. Although there has been plenty of work on automating analog circuit design under typical conditions, limited research has explored robust design under real and unpredictable silicon variations. Automated analog design that targets variations requires prohibitive computation and time costs. To address this challenge, we present RobustAnalog, a robust circuit design framework that incorporates variation information into the optimization process. Specifically, circuit optimizations under different variations are treated as a set of tasks. Similarities among tasks are leveraged and competition among them is alleviated to achieve sample-efficient multi-task training. Moreover, RobustAnalog prunes the task space according to the current performance in each iteration, leading to a further reduction in simulation cost. In this way, RobustAnalog can rapidly produce a set of circuit parameters that satisfy diverse constraints (e.g., gain, bandwidth, noise...) under various variations. We compare RobustAnalog with Bayesian optimization, evolutionary algorithms, and Deep Deterministic Policy Gradient (DDPG), and demonstrate that RobustAnalog can significantly reduce the required optimization time by 14-30x. Our study therefore provides a feasible method for handling various real silicon conditions.
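The toy loop below illustrates the task-pruning idea: each PVT corner is a task, the task set is pruned to the worst performers each iteration, and a random-perturbation update stands in for the paper's sample-efficient multi-task training. All names, the margin metric, and the toy simulator are hypothetical.

```python
# A minimal sketch (hypothetical, not the RobustAnalog release) of
# multi-task circuit sizing with task-space pruning over PVT corners.
import random

def margin(perf, spec):
    """Worst normalized slack over all constraints; >= 0 means spec is met."""
    return min((perf[k] - spec[k]) / abs(spec[k]) for k in spec)

def optimize(simulate, corners, spec, params, iters=200, keep=0.5, step=0.1):
    active = list(corners)                        # one task per PVT corner
    for _ in range(iters):
        scores = {c: simulate(params, c) for c in active}  # costly sims
        worst = min(margin(scores[c], spec) for c in active)
        if worst >= 0:
            break                                 # all active corners pass
        # Prune the task space: keep only the worst-performing corners.
        active.sort(key=lambda c: margin(scores[c], spec))
        active = active[:max(1, int(len(active) * keep))]
        # Stand-in update: accept a random perturbation if the worst
        # margin improves (the paper uses learned multi-task training).
        trial = {k: v + random.gauss(0, step) for k, v in params.items()}
        if min(margin(simulate(trial, c), spec) for c in active) > worst:
            params = trial
    return params

corners = ["TT_25C", "FF_125C", "SS_m40C"]
spec = {"gain_db": 60.0}                          # toy single constraint
drop = {"TT_25C": 0.0, "FF_125C": 2.0, "SS_m40C": 5.0}
sim = lambda p, c: {"gain_db": 55.0 + 20.0 * p["w"] - drop[c]}
print(optimize(sim, corners, spec, {"w": 0.0}))
```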
Employing vehicle-to-vehicle communication to improve perception performance in self-driving technology has attracted considerable attention recently; however, the lack of a suitable open dataset for benchmarking algorithms has made it difficult to develop and assess cooperative perception technologies. To this end, we present the first large-scale open simulated dataset for vehicle-to-vehicle perception. It contains over 70 interesting scenes, 11,464 frames, and 232,913 annotated 3D vehicle bounding boxes, collected from 8 towns in CARLA and a digital town of Los Angeles. We then construct a comprehensive benchmark with a total of 16 implemented models to evaluate several information fusion strategies (i.e., early, late, and intermediate fusion) with state-of-the-art LiDAR detection algorithms. Moreover, we propose a new Attentive Intermediate Fusion pipeline to aggregate information from multiple connected vehicles. Our experiments show that the proposed pipeline can be easily integrated with existing 3D LiDAR detectors and achieves outstanding performance even with large compression rates. To encourage more researchers to investigate vehicle-to-vehicle perception, we will release the dataset, benchmark methods, and all related code at https://mobility-lab.seas.ucla.edu/opv2v/.
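A minimal sketch of what attentive intermediate fusion can look like: per-location soft attention across the bird's-eye-view feature maps shared by the connected vehicles. The shapes and the single-conv scoring head are our assumptions, not the released pipeline.

```python
# A minimal sketch (assumed shapes, not the OPV2V release) of attentive
# intermediate fusion over per-vehicle BEV feature maps.
import torch
import torch.nn as nn

class AttentiveFusion(nn.Module):
    def __init__(self, channels: int = 256):
        super().__init__()
        self.score = nn.Conv2d(channels, 1, kernel_size=1)  # per-pixel logit

    def forward(self, feats: torch.Tensor) -> torch.Tensor:
        # feats: (num_vehicles, C, H, W), already warped to the ego frame.
        logits = self.score(feats)                # (V, 1, H, W)
        weights = torch.softmax(logits, dim=0)    # attend across vehicles
        return (weights * feats).sum(dim=0)       # fused (C, H, W)

fused = AttentiveFusion(64)(torch.randn(3, 64, 100, 100))
print(fused.shape)  # torch.Size([64, 100, 100])
```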
Quantum noise is the key challenge in noisy intermediate-scale quantum (NISQ) computers. Previous work on mitigating noise has mainly focused on gate-level or pulse-level noise-adaptive compilation. However, limited research effort has explored a higher level of optimization: making the quantum circuits themselves resilient to noise. We propose QuantumNAS, a comprehensive framework for the noise-adaptive co-search of variational circuits and qubit mappings. Variational quantum circuits are a promising approach to building QML and quantum simulation. However, finding the best variational circuit and its optimal parameters is challenging due to the large design space and the cost of parameter training. We propose decoupling circuit search from parameter training by introducing a novel SuperCircuit. The SuperCircuit is constructed with multiple layers of pre-defined parameterized gates and trained by iteratively sampling and updating parameter subsets (SubCircuits) of it. It provides an accurate estimate of SubCircuit performance when trained from scratch. We then perform an evolutionary co-search of the SubCircuit and its qubit mapping. SubCircuit performance is estimated with parameters inherited from the SuperCircuit and with real device noise models. Finally, we perform iterative gate pruning and finetuning to remove redundant gates. Extensively evaluated with 12 QML and VQE benchmarks on 10 quantum computers, QuantumNAS significantly outperforms the baselines. For QML, QuantumNAS is the first to demonstrate over 95% 2-class, 85% 4-class, and 32% 10-class classification accuracy on real QCs. It also achieves the lowest eigenvalues for VQE tasks on H2, H2O, LiH, CH4, and BeH2 compared with UCCSD. We also open-source QuantumEngine (https://github.com/mit-han-lab/pytorch-quantum) for fast training of parameterized quantum circuits to facilitate future research.
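To make the SuperCircuit idea concrete, the toy sketch below trains one shared parameter bank while sampling a random gate subset (a SubCircuit) at every step. The front-sampling rule and the scalar stand-in for the measured observable are illustrative assumptions, not the released library's API.

```python
# A minimal sketch (hypothetical, not the QuantumEngine API) of SuperCircuit
# training: shared parameters, with a random SubCircuit updated each step.
import random
import torch

n_layers, gates_per_layer = 8, 4
theta = torch.nn.Parameter(torch.randn(n_layers, gates_per_layer))
opt = torch.optim.Adam([theta], lr=0.05)

def sample_subcircuit():
    """Pick how many leading gates each layer keeps (front-sampling)."""
    return [random.randint(1, gates_per_layer) for _ in range(n_layers)]

def expectation(theta, mask):
    """Toy stand-in for the circuit's measured expectation value."""
    return (torch.cos(theta) * mask).sum()

for step in range(200):
    mask = torch.zeros_like(theta)
    for layer, k in enumerate(sample_subcircuit()):
        mask[layer, :k] = 1.0                 # active gates only
    loss = expectation(theta, mask)           # minimize the observable
    opt.zero_grad(); loss.backward(); opt.step()
```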
Masked image modeling (MIM) performs strongly in pre-training large vision Transformers (ViTs). However, small models that are critical for real-world applications cannot benefit, or benefit only marginally, from this pre-training approach. In this paper, we explore distillation techniques to transfer the success of large MIM-based pre-trained models to smaller ones. We systematically study different options in the distillation framework, including distillation targets, losses, input, network regularization, sequential distillation, etc., revealing that: 1) distilling token relations is more effective than CLS-token- and feature-based distillation; 2) using an intermediate layer of the teacher network as the target performs better than using the last layer when the depth of the student mismatches that of the teacher; 3) weak regularization is preferred; etc. With these findings, we achieve significant fine-tuning accuracy improvements over MIM pre-training from scratch on ImageNet-1K classification, for the ViT-Tiny, ViT-Small, and ViT-Base models, with +4.2%/+2.4%/+1.4% gains, respectively. Our TinyMIM model of base size achieves 52.2 mIoU on ADE20K semantic segmentation, which is +4.1 higher than the MAE baseline. Our TinyMIM model of tiny size achieves 79.6% top-1 accuracy on ImageNet-1K image classification, which sets a new record for small vision models of the same size and computation budget. This strong performance suggests an alternative way of developing small vision Transformer models: exploring better training methods rather than introducing inductive biases into architectures, as in most previous works. Code is available at https://github.com/OliverRensu/TinyMIM.
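As one concrete reading of token-relation distillation, the sketch below matches the student's token-token similarity map to the teacher's with a soft cross-entropy rather than matching raw features. TinyMIM's actual relation targets (e.g., attention-derived relations) and hyperparameters may differ, so treat the shapes and temperature as assumptions.

```python
# A minimal sketch (assumed form, not the TinyMIM release) of relation
# distillation between teacher and student patch tokens.
import torch
import torch.nn.functional as F

def relation_distill_loss(student_tok, teacher_tok, tau=1.0):
    # *_tok: (batch, num_tokens, dim) tokens from a chosen layer; the two
    # dims may differ since relations are computed within each model.
    def relations(tok):
        tok = F.normalize(tok, dim=-1)
        return tok @ tok.transpose(1, 2) / tau    # (B, N, N) similarities
    target = relations(teacher_tok).softmax(dim=-1)
    pred = relations(student_tok).log_softmax(dim=-1)
    return F.kl_div(pred, target, reduction="batchmean")

loss = relation_distill_loss(torch.randn(2, 196, 192), torch.randn(2, 196, 768))
print(loss.item())
```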
Few-Shot Instance Segmentation (FSIS) requires models to detect and segment novel classes with only a few support examples. In this work, we explore a simple yet unified solution for FSIS as well as its incremental variants, and introduce a new framework named Reference Twice (RefT) to fully explore the relationship between support and query features within a Transformer-like framework. Our key insights are twofold: first, with the aid of support masks, we can generate dynamic class centers more appropriately to re-weight query features; second, we find that support object queries have already encoded key factors after base training. In this way, the query features can be enhanced twice, from two aspects: the feature level and the instance level. In particular, we first design a mask-based dynamic weighting module to enhance support features and then propose to link object queries for better calibration via cross-attention. After the above steps, performance on novel classes improves significantly over our strong baseline. Additionally, our new framework can easily be extended to incremental FSIS with minor modifications. When benchmarking results on the COCO dataset for the FSIS, gFSIS, and iFSIS settings, our method achieves competitive performance compared to existing approaches across different shot counts; e.g., we boost nAP by a noticeable +8.2/+9.4 over the current state-of-the-art FSIS method for the 10/30-shot settings. We further demonstrate the superiority of our approach on Few-Shot Object Detection. Code and models will be made available.
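A minimal sketch of the feature-level step under stated assumptions: masked average pooling over the support features produces a dynamic class center, which then gates the query feature map channel-wise. The sigmoid gating and the shapes are illustrative, not the released module.

```python
# A minimal sketch (assumed shapes, not the RefT release) of mask-based
# dynamic class centers used to re-weight query features.
import torch

def reweight_query(query_feat, support_feat, support_mask):
    # query_feat, support_feat: (C, H, W); support_mask: (H, W) binary.
    m = support_mask.flatten()                       # (H*W,)
    s = support_feat.flatten(1)                      # (C, H*W)
    center = (s * m).sum(1) / m.sum().clamp(min=1)   # (C,) dynamic class center
    gates = torch.sigmoid(center)                    # channel-wise weights
    return query_feat * gates[:, None, None]

out = reweight_query(torch.randn(256, 32, 32), torch.randn(256, 32, 32),
                     (torch.rand(32, 32) > 0.5).float())
print(out.shape)  # torch.Size([256, 32, 32])
```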
We present Muse, a text-to-image Transformer model that achieves state-of-the-art image generation performance while being significantly more efficient than diffusion or autoregressive models. Muse is trained on a masked modeling task in discrete token space: given the text embedding extracted from a pre-trained large language model (LLM), Muse is trained to predict randomly masked image tokens. Compared to pixel-space diffusion models, such as Imagen and DALL-E 2, Muse is significantly more efficient due to the use of discrete tokens and requiring fewer sampling iterations; compared to autoregressive models, such as Parti, Muse is more efficient due to the use of parallel decoding. The use of a pre-trained LLM enables fine-grained language understanding, translating to high-fidelity image generation and the understanding of visual concepts such as objects, their spatial relationships, pose, and cardinality. Our 900M parameter model achieves a new SOTA on CC3M, with an FID score of 6.06. The Muse 3B parameter model achieves an FID of 7.88 on zero-shot COCO evaluation, along with a CLIP score of 0.32. Muse also directly enables a number of image editing applications without the need to fine-tune or invert the model: inpainting, outpainting, and mask-free editing. More results are available at https://muse-model.github.io
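The sketch below shows generic confidence-based parallel decoding of masked tokens with a cosine schedule, in the spirit described above; the scheduling details, interface, and toy model are assumptions rather than Muse's implementation.

```python
# A minimal sketch (assumed interface, not the Muse code) of parallel
# decoding: all masked tokens are predicted at once each round, and only
# the most confident predictions are kept before re-masking the rest.
import math
import torch

def parallel_decode(predict_logits, num_tokens, steps=12, mask_id=-1):
    tokens = torch.full((num_tokens,), mask_id, dtype=torch.long)
    for step in range(steps):
        probs = predict_logits(tokens).softmax(-1)   # (num_tokens, vocab)
        conf, pred = probs.max(-1)
        conf = conf.masked_fill(tokens != mask_id, float("inf"))  # keep fixed
        tokens = torch.where(tokens == mask_id, pred, tokens)
        # Cosine schedule: fraction of tokens to re-mask after this round.
        n_mask = int(num_tokens * math.cos(math.pi / 2 * (step + 1) / steps))
        if n_mask > 0:
            tokens[conf.topk(n_mask, largest=False).indices] = mask_id
    return tokens

toy_model = lambda t: torch.randn(t.numel(), 1024)   # stand-in for the model
print(parallel_decode(toy_model, 64)[:8])
```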
Learning the underlying distribution of molecular graphs and generating high-fidelity samples is a fundamental research problem in drug discovery and material science. However, accurately modeling distribution and rapidly generating novel molecular graphs remain crucial and challenging goals. To accomplish these goals, we propose a novel Conditional Diffusion model based on discrete Graph Structures (CDGS) for molecular graph generation. Specifically, we construct a forward graph diffusion process on both graph structures and inherent features through stochastic differential equations (SDE) and derive discrete graph structures as the condition for reverse generative processes. We present a specialized hybrid graph noise prediction model that extracts the global context and the local node-edge dependency from intermediate graph states. We further utilize ordinary differential equation (ODE) solvers for efficient graph sampling, based on the semi-linear structure of the probability flow ODE. Experiments on diverse datasets validate the effectiveness of our framework. Particularly, the proposed method still generates high-quality molecular graphs in a limited number of steps.
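As a generic illustration of probability-flow ODE sampling (here an Euler solver on a VP-SDE over dense vectors rather than graphs), the sketch below integrates the semi-linear drift plus the learned-score term from t=1 to t=0. Everything here is a simplified stand-in for the paper's graph-structured sampler.

```python
# A minimal sketch (generic VP-SDE form, not the CDGS release) of sampling
# with the probability-flow ODE using a plain Euler solver.
import torch

def pf_ode_sample(score_fn, x, steps=100, beta_min=0.1, beta_max=20.0):
    dt = -1.0 / steps
    t = torch.ones(x.shape[0])
    for _ in range(steps):
        beta = beta_min + t * (beta_max - beta_min)   # linear noise schedule
        drift = -0.5 * beta[:, None] * x              # VP-SDE drift f(x, t)
        # Probability-flow ODE: dx = [f(x,t) - 0.5 g(t)^2 score(x,t)] dt
        dx = drift - 0.5 * beta[:, None] * score_fn(x, t)
        x = x + dx * dt
        t = t + dt
    return x

toy_score = lambda x, t: -x                           # stand-in learned score
print(pf_ode_sample(toy_score, torch.randn(4, 8)).shape)
```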